15 research outputs found

    Paradigm of tunable clustering using binarization of consensus partition matrices (Bi-CoPaM) for gene discovery

    Copyright © 2013 Abu-Jamous et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
    Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed. In this paradigm, all three eventualities are possible: a gene may be exclusively assigned to a single cluster, assigned to multiple clusters, or not assigned to any cluster. These possibilities are realised through the primary novelty of the paradigm, the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet the requirements of different gene discovery studies. National Institute for Health Research
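
The aggregate-then-binarize idea in this abstract can be illustrated with a minimal sketch. It assumes that the individual clustering runs have already been relabelled so that cluster indices correspond across runs, and it uses a plain membership threshold as the tunable binarization. Both are illustrative assumptions, as are the function names and the toy data; the binarization techniques defined in the paper itself may differ.

```python
def consensus_partition_matrix(partitions, n_clusters):
    """Average aligned hard partitions into a fuzzy genes-by-clusters matrix.

    Assumption: cluster labels are already matched across runs.
    """
    n_genes = len(partitions[0])
    weight = 1.0 / len(partitions)
    copam = [[0.0] * n_clusters for _ in range(n_genes)]
    for labels in partitions:
        for gene, cluster in enumerate(labels):
            copam[gene][cluster] += weight
    return copam


def binarize(copam, threshold):
    """Hypothetical threshold rule: keep memberships at or above `threshold`."""
    return [[1 if m >= threshold else 0 for m in row] for row in copam]


# Toy data: four clustering runs over four genes, two clusters.
partitions = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
copam = consensus_partition_matrix(partitions, n_clusters=2)
tight = binarize(copam, threshold=1.0)   # only unanimous assignments survive
wide = binarize(copam, threshold=0.25)   # any recurring assignment survives
```

With the strict threshold, genes on which the runs disagree end up in no cluster; with the loose threshold, the same genes belong to both clusters, which mirrors the tight and wide clusters described above.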

    A new scoring system in Cystic Fibrosis: statistical tools for database analysis – a preliminary report

    Background: Cystic fibrosis is the most common fatal genetic disorder in the Caucasian population. Scoring systems for assessment of cystic fibrosis disease severity have been used for almost 50 years, without being adapted to the milder phenotype of the disease in the 21st century. The aim of this project is to develop a new scoring system using a database and employing various statistical tools. This study protocol reports the development of the statistical tools needed to create such a scoring system.
    Methods: The evaluation is based on the Cystic Fibrosis database from the cohort at the Royal Children's Hospital in Melbourne. Initially, unsupervised clustering of all data records was performed using a range of clustering algorithms, in particular incremental clustering algorithms. The clusters obtained were characterised using rules from decision trees, and the results were examined by clinicians. To obtain a clearer definition of classes, expert opinion of each individual's clinical severity was sought. After data preparation, including expert opinion of an individual's clinical severity on a 3-point scale (mild, moderate and severe disease), two multivariate techniques were used throughout the analysis to establish a method that would have better success in feature selection and model derivation: 'Canonical Analysis of Principal Coordinates' (CAP) and 'Linear Discriminant Analysis' (DA). A 3-step procedure was performed: (1) selection of features, (2) extraction of 5 severity classes from the 3 severity classes defined by expert opinion, and (3) establishment of calibration datasets.
    Results: (1) Feature selection: CAP has a more effective "modelling" focus than DA. (2) Extraction of 5 severity classes: after variables were identified as important in discriminating contiguous CF severity groups on the 3-point scale (mild/moderate and moderate/severe), the Discriminant Function (DF) was used to determine the new groups: mild, intermediate moderate, moderate, intermediate severe and severe disease. (3) Generated confusion tables showed a misclassification rate of 19.1% for males and 16.5% for females, with a majority of misallocations into adjacent severity classes, particularly for males.
    Conclusion: Our preliminary data show that using CAP to select features and Linear DA to derive the actual model in a CF database might be helpful in developing a scoring system. However, there are several limitations: in particular, more data entry points are needed to finalize a score, and the statistical tools need further refinement and validation by re-running the statistical methods on the larger dataset.

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed
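
As a rough illustration of one of the baseline families named in this abstract, the sketch below ranks candidate titles against a seed title by Term Frequency-Inverse Document Frequency weighting and cosine similarity. The whitespace tokenisation, the particular weighting formula, the function names and the toy titles are all assumptions made for illustration; this is not the consortium's implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for a small corpus."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


# Toy corpus: a seed title followed by two candidate titles.
docs = [
    "mismatch repair gene mutations colon cancer",
    "mismatch repair deficiency colorectal tumors mice",
    "clustering genes yeast cell cycle",
]
seed, related, unrelated = tfidf_vectors(docs)
score_related = cosine(seed, related)
score_unrelated = cosine(seed, unrelated)
```

The candidate that shares terms with the seed receives a positive score, while the candidate with no shared terms scores zero, so the former would be recommended first.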

    Susceptibility of Msh2-deficient mice to inflammation-associated colorectal tumors

    Patients with longstanding extensive ulcerative colitis have an increased risk of developing colorectal cancer (CRC). There are significant differences in the early pathogenesis of colitis-associated tumors compared with common CRC, whereas the frequency, degree, and significance of microsatellite instability (MSI) as a marker of mismatch repair deficiency in colitis tumors remain unclear. Here we describe the application of the DSS model of chronic colitis to mice with a defect in the Msh2 mismatch repair gene to discern these early events. These mice do not develop CRC spontaneously without an external trigger. The aim of this study was to determine the effect of the Msh2 defect on the frequency and grade of colitis-associated colorectal dysplasia and adenocarcinoma in Msh2-/-, Msh2+/-, and wild-type (Msh2+/+) mice and on the MSI status of the tumors. We show that in mice with chronic colitis, 60% of the Msh2-/- and 29% of the wild-type mice developed high-grade dysplasia or adenocarcinoma, but heterozygosity for the Msh2 defect did not increase tumor susceptibility over the wild-type genotype. The largest difference between genotypes was in the frequency of high-grade dysplasia, with 46.7, 8, and 12.5% in Msh2-/-, Msh2+/-, and Msh2+/+ mice, respectively. The Msh2-/- mice developed MSI-high tumors, whereas the majority of the Msh2+/- and wild-type tumors had no MSI. In the Msh2-/- mice, MSI appeared early in non-neoplastic colon tissue, presumably as a result of markedly increased epithelial cell proliferation associated with inflammation. These observations suggest that a homozygous mismatch repair defect predisposes to tumors triggered by chronic inflammation but is not the only factor involved, because tumors also developed in the wild-type mice. This model of colitis offers opportunities to characterize the different molecular pathways of carcinogenesis operating in chronic colitis.

    Two mismatch repair gene mutations found in a colon cancer patient - which one is pathogenic?

    Hereditary nonpolyposis colorectal cancer (HNPCC) is a dominantly inherited cancer syndrome. Germline mutations in five different mismatch repair (MMR) genes, MSH2, MSH6, MLH1, MLH3, and PMS2, are linked to HNPCC. Here, we describe two colon cancer families in which the index patients carry missense mutations in both MSH2 and MSH6. The MSH2 mutation, I145M, is the same in both families, whereas the MSH6 mutations are different (R1095H and L1354Q). The families do not fulfil the international criteria for HNPCC, one family comprising two and the other family four colon cancer patients, all in one generation, resembling a recessive rather than the dominant inheritance characteristic of HNPCC. The tumors of the index patients showed microsatellite instability. Functional analysis was performed to determine which one of the mutations could primarily underlie the cancer susceptibility in the families. MSH2 and MSH6 are known to form a heterodimeric complex (MutSα) responsible for mismatch recognition. The interaction of each mutated protein with its wild-type partner and with its mutated partner present in the colon cancer patient, and the MMR function of the mutated MutSα complexes, were determined. Since none of the three mutations affected the MSH2-MSH6 interaction or the function of MutSα in an in-vitro MMR assay, our results suggest that the mutations alone do not cause the MMR deficiency typical of HNPCC. However, our results do not exclude the possible compound pathogenicity of the two mutations.